Efficient decoding algorithm using triangularity
of R matrix of QR-decomposition

In Sook Park 

Abstract 

An efficient decoding algorithm named 'divided decoder' is proposed in this paper. Divided decoding
can be combined with any decoder using QR-decomposition and offers different trade-offs between
performance and complexity. Divided decoding provides various combinations of two or more different
searching algorithms. Hence it gives flexibility in error rate and complexity to the algorithms that use it.
We calculate diversity orders and upper bounds of error rates for typical models when these models are
solved by divided decoding with a sphere decoder, and discuss the effects of divided decoding
on complexity. Simulation results of divided decoding combined with a sphere decoder for
different splitting indices agree with the theoretical analysis.

Index Terms 

Multiple-input multiple-output (MIMO) channels, near maximum likelihood, MIMO detection, sphere
decoder, lattice reduction.

The author is with the BK Institute of Information and Technology, Division of Electrical Engineering, Department of Electrical
Engineering and Computer Science, KAIST, Daejeon, Korea [e-mail: ispark@amath.kaist.ac.kr; ispark@kaist.ac.kr].

January 22, 2009

DRAFT

I. Introduction

To obtain high data rates and spectral efficiency, communication systems require a detector
whose error rate is as close as possible to that of the maximum likelihood (ML) solution
with tolerable complexity. In most cases the additive noise vector is assumed to be Gaussian
with zero mean vector, and detecting the original signal from a received signal turns into solving an
integer least-squares problem. This paper proposes a method for solving the integer least-squares
problem, which is to find $\hat{\mathbf{s}}$ such that

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in D} \|\mathbf{x} - \mathbf{H}\mathbf{s}\|^2 \qquad (1)$$

where $D$ is a set of $n$-dimensional complex vectors whose real and imaginary parts are integers
(or discrete numbers), $\mathbf{x}$ is an $m$-dimensional complex vector, and $\mathbf{H}$ is an $m \times n$ complex matrix.
The exact solution of (1) is the ML solution when $\mathbf{x} - \mathbf{H}\mathbf{s}$ is an $m \times 1$ Gaussian random vector whose
mean is $\mathbf{0}_{m \times 1}$. The brute-force search visits all the points of $D$, which makes the complexity
grow exponentially in $n$. Sphere decoding (SD), a depth-first tree search
within a sphere that can shrink with each new candidate during the search, is known to
find the exact solution of (1) while reducing the complexity considerably, so that it very often finds
the solution in real time when the brute-force search cannot. Efficient search strategies
are employed by both real and complex sphere decoders [9]. Usually SD calculates an initial radius
before starting the search but, as has been noted, when the Schnorr-Euchner
strategy is used the radius of the Babai point [3] is enough for a good start and the time
required for the initial radius estimation is saved. The expected complexity of SD is known to
be approximately polynomial for a wide range of signal-to-noise ratios (SNRs) and numbers of
antennas [10], [11]. But it still depends on the SNR and has a larger share of high-order terms in
the dimension of the vector being searched.
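As a point of reference for the exponential cost of the brute-force search over $D$ mentioned above, the following is a minimal sketch of an exhaustive solver for problem (1). The function name `brute_force_ml`, the QPSK-like alphabet and the toy dimensions are assumptions made only for this illustration; they do not come from the paper.

```python
import itertools
import numpy as np

def brute_force_ml(H, x, alphabet):
    """Exhaustively minimise ||x - H s||^2 over s in alphabet^n (q^n candidates)."""
    n = H.shape[1]
    best_s, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=n):
        s = np.array(cand, dtype=complex)
        cost = np.linalg.norm(x - H @ s) ** 2
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost

# Hypothetical toy instance: 3 streams, 4 receive dimensions, QPSK-like alphabet.
rng = np.random.default_rng(0)
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
s_true = np.array([1 + 1j, -1 - 1j, 1 - 1j])
x = H @ s_true                      # noise-free, so the minimiser is s_true
s_hat, cost = brute_force_ml(H, x, qpsk)
```

The loop visits $q^n$ candidates, which is why the search becomes impractical as $n$ or the constellation size grows.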

Algorithms finding near ML solutions with the advantage of complexity reduction have been
suggested over recent decades. Among them, the M-algorithm combined with QR-decomposition
(QRD-M) [12], [13] has performance almost the same as ML when the value of M is not less
than the constellation size. For fixed M, the computation amount of QRD-M is independent
of the SNR and of the condition number of the channel matrix, and is polynomial in the dimension of the
vector to be searched. But, for almost the same performance, the expected computation amount
of SD is much less than that of QRD-M, though the maximum computation amount of SD is
more than twice the maximum computation amount of QRD-M [14]. Detection with the
aid of lattice reduction (LR) is another approach: LR helps SD to reduce the complexity
when the channel matrix is ill-conditioned and helps linear detection or successive interference
cancellation (SIC) to perform better [15], [16], [17]. However, checking the validity of
every searched point adds computational load, and calculating the log-likelihood ratio (LLR) is still
burdensome for LR-aided detection. The fixed-complexity sphere decoder (FSD) [18] is SD
within a subset of the domain to be searched and visits only a fixed number of lattice points.
FSD with a properly restricted domain has near ML performance with a fixed complexity for
each set of $m$, $n$ and constellation.

Nulling and cancelling with optimal ordering, i.e. zero-forcing with ordered successive
interference cancellation (ZF-OSIC) and minimum mean square error with ordered successive
interference cancellation (MMSE-OSIC) [19], are standard methods and serve as bases for developing
advanced decoding algorithms. Both ZF-OSIC and MMSE-OSIC can be performed efficiently,
with a reduced computation amount, by employing QR-decomposition (QRD) or sorted QRD
(SQRD) [20]. Nulling and cancelling and the near ML algorithms above perform QRD before
the search. (Cholesky decomposition is frequently used instead of QRD.) In practice, in terms of
error rate, ZF-OSIC and MMSE-OSIC are usable for modulations higher than QPSK
when the number of transmit streams is no more than 4. If the number of transmit streams
is more than 4 with high-order modulation, decoding algorithms that run in real time with a lower
error rate than nulling and cancelling are required. To meet this requirement, we propose a
simple method called 'divided decoding', which utilizes the properties of the matrices resulting from
QRD (or Cholesky decomposition) and can be combined with any given searching algorithm. Divided
decoding can provide various modifications or combinations of searching algorithms that are
known or yet to appear.

The remainder of the paper is composed of five sections as follows. In Section II we describe the basic
system model to be solved. In Section III we introduce the idea of divided decoding and the possible
ways of combining divided decoding with other algorithms. Section IV provides diversity
orders and upper bounds of the error probabilities for some typical models, obtained by summing up
pairwise error probabilities when divided decoding is combined with SD, and a discussion
of the complexity reduction effects of divided decoding. Section V presents simulation results
supporting the analyses in Section IV by showing how the bit error rate (BER)
and complexity curves versus SNR change with the splitting index set, and compares divided
decoding based on SD with Lenstra-Lenstra-Lovász (LLL) LR [21] aided SIC. Section VI
concludes the paper.

II. System Model

An original signal vector $\mathbf{s}$ belonging to $D$, a finite subset of an $n$-dimensional lattice, passes
through a channel and is measured as an $m$-dimensional vector $\mathbf{x}$. The relation between $\mathbf{s}$ and $\mathbf{x}$
is modeled by

$$\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n} \qquad (2)$$

where $\mathbf{H}$ is an $m \times n$ channel matrix whose distribution is arbitrary and the elements of $\mathbf{n}$ are
assumed to be independent and identically distributed (i.i.d.) circularly symmetric complex normal
variables with mean zero and variance $\sigma^2$. Usually, for $q$-QAM constellations, $D$ is the Cartesian
product of $n$ copies of the $q$ lattice points. (1) is the ML solution of (2). (2) is transformed to a
real system if the decoding algorithm used is based on real number calculations.

To describe the algorithm we propose, we need the following notation: the sub-matrix
composed of the elements in rows $a$ through $b$ of columns $c$ through $d$ of a matrix $\mathbf{A}$ is denoted
by $\mathbf{A}[a:b][c:d]$. When $\mathbf{v}$ is a column vector, the sub-vector composed of the elements in rows
$a$ through $b$ of $\mathbf{v}$ is denoted by $\mathbf{v}[a:b]$.
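Because the sub-matrix notation above is 1-based and inclusive, it is easy to get off-by-one errors when translating it to code. The helper below is a small sketch of one possible mapping to NumPy slicing; the name `sub` is hypothetical and is introduced only for this illustration.

```python
import numpy as np

def sub(A, a, b, c=None, d=None):
    """Return A[a:b][c:d] (or v[a:b]) in the text's 1-based, inclusive notation."""
    A = np.asarray(A)
    if A.ndim == 1 or c is None:
        return A[a - 1:b]
    return A[a - 1:b, c - 1:d]

A = np.arange(1, 13).reshape(3, 4)   # rows: [1..4], [5..8], [9..12]
block = sub(A, 2, 3, 3, 4)           # rows 2..3, columns 3..4
v = np.array([10, 20, 30, 40])
tail = sub(v, 2, 4)                  # entries 2..4
```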

III. Divided decoding 

A. The Idea of Divided decoding 

First, $\mathbf{H}$ is decomposed into $\mathbf{Q}\mathbf{R}$ by QRD, where $\mathbf{Q}$ is an $m \times n$ matrix with orthonormal
columns (the first $n$ columns of an $m \times m$ unitary matrix) and $\mathbf{R}$ is an $n \times n$ upper-triangular
matrix with non-negative diagonal entries. $\mathbf{Q}\mathbf{R}$ is called the thin factorization of $\mathbf{H}$.
To improve the performance of the algorithm presented below, either the columns of $\mathbf{H}$ are
reordered in increasing order of Euclidean norm before QRD, or $\mathbf{H}$ is decomposed by sorted
QRD (SQRD), which is a QRD interleaved with a column sorting process. SQRD is described in [20].
SQRD is more effective for performance improvement and we use SQRD in what follows. We
let $\mathbf{y} = \mathbf{Q}^*\mathbf{x}$ and $\mathbf{z} = \mathbf{Q}^*\mathbf{n}$, where $\mathbf{Q}^*$ is the conjugate transpose of $\mathbf{Q}$. Then (2) is reformulated
as

$$\mathbf{y} = \mathbf{R}\mathbf{s} + \mathbf{z} \qquad (3)$$

where $\mathbf{z}$ is statistically equivalent to $\mathbf{n}$, i.e. the elements of $\mathbf{z}$ are i.i.d. circularly symmetric
complex normal variables with mean zero and variance $\sigma^2$. For any $1 \le i, j \le n$, the inner
product of the $i$th and $j$th columns of $\mathbf{R}$ is equal to the inner product of the $i$th and $j$th columns of $\mathbf{H}$.
Hence the SNR for each symbol of $\mathbf{s}$ is unchanged.
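The transformation from (2) to (3) can be sketched numerically as follows. This is only an illustration using the plain (unsorted) thin QR from NumPy with the diagonal of $\mathbf{R}$ rotated to be non-negative; it does not implement the SQRD of [20], and the helper name `thin_qr_nonneg` and the toy sizes are assumptions.

```python
import numpy as np

def thin_qr_nonneg(H):
    """Thin QR of H with the diagonal of R made real and non-negative."""
    Q, R = np.linalg.qr(H)            # reduced mode: Q is m x n, R is n x n
    d = np.diag(R)
    phase = np.where(np.abs(d) > 0, d / np.abs(d), 1.0)
    # H = Q R = (Q P)(P^* R) for the diagonal unitary P = diag(phase)
    return Q * phase, np.conj(phase)[:, None] * R

rng = np.random.default_rng(1)
m, n = 5, 3
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
s = np.array([1 + 1j, -1 + 1j, 1 - 1j])
noise = 0.1 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
x = H @ s + noise

Q, R = thin_qr_nonneg(H)
y = Q.conj().T @ x                    # y = R s + z with z = Q^* n
```

Since $\mathbf{R}^*\mathbf{R} = \mathbf{H}^*\mathbf{H}$, the column inner products are preserved, which is the property the text uses to argue that the per-symbol SNR is unchanged.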

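The simplest version of divided decoding is as follows: i) For any $i_0$ ($1 \le i_0 < n$), let (3) be
split into

$$\mathbf{y}_1 = \mathbf{R}_1 \begin{bmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \end{bmatrix} + \mathbf{z}_1, \qquad \mathbf{y}_2 = \mathbf{R}_2\mathbf{s}_2 + \mathbf{z}_2 \qquad (4)$$

where $\mathbf{R}_1 = \mathbf{R}[1:i_0][1:n]$, $\mathbf{y}_1 = \mathbf{y}[1:i_0]$, $\mathbf{s}_1 = \mathbf{s}[1:i_0]$, $\mathbf{z}_1 = \mathbf{z}[1:i_0]$, $\mathbf{R}_2 = \mathbf{R}[i_0+1:n][i_0+1:n]$,
$\mathbf{y}_2 = \mathbf{y}[i_0+1:n]$, $\mathbf{s}_2 = \mathbf{s}[i_0+1:n]$, and $\mathbf{z}_2 = \mathbf{z}[i_0+1:n]$. First, find the $\mathbf{s}_2$
minimizing $\|\mathbf{y}_2 - \mathbf{R}_2\mathbf{s}_2\|^2$ by applying one of SD, the M-algorithm and other near ML algorithms.
Let $\hat{\mathbf{s}}_2$ denote this point and calculate $\tilde{\mathbf{y}}_1 = \mathbf{y}_1 - \mathbf{R}_1[1:i_0][i_0+1:n]\,\hat{\mathbf{s}}_2$. Secondly, find the $\mathbf{s}_1$,
denoted by $\hat{\mathbf{s}}_1$, minimizing $\|\tilde{\mathbf{y}}_1 - \mathbf{R}_1[1:i_0][1:i_0]\,\mathbf{s}_1\|^2$ by applying one of SD, the M-algorithm and
other near ML algorithms. Then $[\hat{\mathbf{s}}_1^T \ \hat{\mathbf{s}}_2^T]^T$ is an approximate solution of (1).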

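Method (i) is extended as follows: ii) (3) is split into more than two equations. Given
$i_0, i_1, \ldots, i_k$ ($1 \le i_0 < i_1 < \cdots < i_k < n$), let $i_{-1} = 0$, $i_{k+1} = n$ and then, for $1 \le f \le k+2$,
let $\mathbf{R}_f = \mathbf{R}[i_{f-2}+1 : i_{f-1}][i_{f-2}+1 : n]$, $\mathbf{y}_f = \mathbf{y}[i_{f-2}+1 : i_{f-1}]$, $\mathbf{s}_f = \mathbf{s}[i_{f-2}+1 : i_{f-1}]$, $\mathbf{z}_f = \mathbf{z}[i_{f-2}+1 : i_{f-1}]$.
Then (3) is split into $k+2$ equations as follows: for $1 \le f \le k+2$,

$$\mathbf{y}_f = \mathbf{R}_f \begin{bmatrix} \mathbf{s}_f \\ \vdots \\ \mathbf{s}_{k+2} \end{bmatrix} + \mathbf{z}_f. \qquad (5)$$

We find the $\mathbf{s}_{k+2}$, denoted by $\hat{\mathbf{s}}_{k+2}$, minimizing $\|\mathbf{y}_{k+2} - \mathbf{R}_{k+2}\mathbf{s}_{k+2}\|^2$. Starting from $f = k+1$,
compute

$$\tilde{\mathbf{y}}_f = \mathbf{y}_f - \mathbf{R}_f[1 : i_{f-1}-i_{f-2}][i_{f-1}-i_{f-2}+1 : n-i_{f-2}] \begin{bmatrix} \hat{\mathbf{s}}_{f+1} \\ \vdots \\ \hat{\mathbf{s}}_{k+2} \end{bmatrix}$$

and detect the $\hat{\mathbf{s}}_f$ minimizing $\|\tilde{\mathbf{y}}_f - \mathbf{R}_f[1 : i_{f-1}-i_{f-2}][1 : i_{f-1}-i_{f-2}]\,\mathbf{s}_f\|^2$, repeatedly with decreasing $f$ one
by one until $f = 1$. Consequently, we obtain $\hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_{k+2}$. $[\hat{\mathbf{s}}_1^T \ \cdots \ \hat{\mathbf{s}}_{k+2}^T]^T$ is an approximate
solution of (1).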

If $i_0 = 1, i_1 = 2, \ldots, i_{k=n-2} = n-1$ ($i_0 = 2, i_1 = 4, \ldots, i_{k=n/2-2} = n-2$ if (3) is a real
version of the original complex system) then the above method is the same as ZF-OSIC. As
the number of split equations increases, the computation amount decreases but the error rate
increases.
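Methods (i) and (ii) above can be sketched as a short back-substitution loop over blocks. In the sketch below the per-block detector is a plain exhaustive search, standing in for SD, the M-algorithm or any other near ML detector; the names `divided_decode` and `exhaustive`, the QPSK-like alphabet and the noise-free toy data are assumptions made for illustration only.

```python
import itertools
import numpy as np

def exhaustive(Rb, yb, alphabet):
    """ML search on one block: minimise ||yb - Rb s||^2 over alphabet^(block size)."""
    cands = (np.array(c, dtype=complex)
             for c in itertools.product(alphabet, repeat=Rb.shape[1]))
    return min(cands, key=lambda s: np.linalg.norm(yb - Rb @ s))

def divided_decode(R, y, splits, alphabet, detect=exhaustive):
    """splits = [i_0, ..., i_k] with 1 <= i_0 < ... < i_k < n, 1-based as in the text."""
    n = R.shape[0]
    bounds = [0] + list(splits) + [n]          # i_{-1} = 0, ..., i_{k+1} = n
    s_hat = np.zeros(n, dtype=complex)
    for f in range(len(bounds) - 1, 0, -1):    # last sub-vector first
        lo, hi = bounds[f - 1], bounds[f]
        # cancel the contribution of the sub-vectors already detected
        y_tilde = y[lo:hi] - R[lo:hi, hi:] @ s_hat[hi:]
        s_hat[lo:hi] = detect(R[lo:hi, lo:hi], y_tilde, alphabet)
    return s_hat

rng = np.random.default_rng(2)
n = 6
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
_, R = np.linalg.qr(H)
s_true = rng.choice(qpsk, size=n)
y = R @ s_true                                            # noise-free check
two_blocks = divided_decode(R, y, [3], qpsk)              # form (4) with i_0 = 3
one_per_block = divided_decode(R, y, [1, 2, 3, 4, 5], qpsk)
```

With one symbol per block the loop reduces to symbol-by-symbol successive cancellation on the triangular system, which matches the remark that the finest split coincides with nulling and cancelling. Passing a different `detect` callable is one way to experiment with other per-block detectors.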
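B. Divided decoding with Quasi MMSE extension

As described in [20], the MMSE filter output $\hat{\mathbf{s}}_{MMSE}$ is reformulated as

$$\hat{\mathbf{s}}_{MMSE} = (\underline{\mathbf{H}}^*\underline{\mathbf{H}})^{-1}\underline{\mathbf{H}}^*\underline{\mathbf{x}} \qquad (6)$$

where $\underline{\mathbf{H}}$ and $\underline{\mathbf{x}}$ are

$$\underline{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \sigma\mathbf{I}_n \end{bmatrix} \quad \text{and} \quad \underline{\mathbf{x}} = \begin{bmatrix} \mathbf{x} \\ \mathbf{0}_{n \times 1} \end{bmatrix}. \qquad (7)$$

We can reconstruct an extended system of (2) as follows:

$$\underline{\mathbf{x}} = \underline{\mathbf{H}}\mathbf{s} + \underline{\mathbf{n}} \qquad (8)$$

where $\underline{\mathbf{n}} = \begin{bmatrix} \mathbf{n} \\ -\sigma\mathbf{s} \end{bmatrix}$ and $\underline{\mathbf{n}}$ is assumed to be a Gaussian noise vector. We ignore the fact that $-\sigma\mathbf{s}$
contains the signal and regard it as part of the noise vector.
Instead of $\mathbf{H}$, perform SQRD on $\underline{\mathbf{H}}$ to obtain $\underline{\mathbf{H}} = \underline{\mathbf{Q}}\,\underline{\mathbf{R}}$ and multiply (8) by $\underline{\mathbf{Q}}^*$ to obtain

$$\underline{\mathbf{y}} = \underline{\mathbf{R}}\mathbf{s} + \underline{\mathbf{z}} \qquad (9)$$

where $\underline{\mathbf{y}} = \underline{\mathbf{Q}}^*\underline{\mathbf{x}}$ and $\underline{\mathbf{z}} = \underline{\mathbf{Q}}^*\underline{\mathbf{n}}$. If we search $\hat{\mathbf{s}}_M = \arg\min_{\mathbf{s} \in D} \|\underline{\mathbf{y}} - \underline{\mathbf{R}}\mathbf{s}\|^2$ by SD then $\hat{\mathbf{s}}_M$ is a near
ML solution which has almost negligible performance loss in comparison with the ML solution.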
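Quasi MMSE extension is a generalization of MMSE extension as follows [22]:

$$\underline{\mathbf{x}} = \begin{bmatrix} \mathbf{H} \\ \epsilon\sigma\mathbf{I}_n \end{bmatrix} \mathbf{s} + \begin{bmatrix} \mathbf{n} \\ -\epsilon\sigma\mathbf{s} \end{bmatrix} \qquad (10)$$

where $\epsilon$ is a positive real number. Let $\mathbf{H}_\epsilon = \begin{bmatrix} \mathbf{H} \\ \epsilon\sigma\mathbf{I}_n \end{bmatrix}$ and $\hat{\mathbf{s}}_\epsilon = \arg\min_{\mathbf{s} \in D} \|\underline{\mathbf{x}} - \mathbf{H}_\epsilon\mathbf{s}\|^2$; then
$\hat{\mathbf{s}}_{\epsilon=1.0} = \hat{\mathbf{s}}_M$. The performance of $\hat{\mathbf{s}}_\epsilon$ for several values of $\epsilon$ and the effect of Quasi MMSE extension
on the complexity of closest point search are described in [22]. For two particular values of $\epsilon$ below 1.0, the performance
of $\hat{\mathbf{s}}_\epsilon$ in the low SNR range is better than that of $\hat{\mathbf{s}}$ (the ML solution), and the complexity required to find $\hat{\mathbf{s}}_\epsilon$
by using SD is far lower than that required to find $\hat{\mathbf{s}}$. The same behavior is expected for other values of $\epsilon$
between 0 and 1.0. For $0 < \epsilon < \sqrt{2}$, $\hat{\mathbf{s}}_\epsilon$ has almost the same BER as $\hat{\mathbf{s}}$, and as $\epsilon$ increases,
at least within this range, the computation amount decreases.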
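The extended matrices of (7) and (10) are straightforward to build, and for $\epsilon = 1$ the least-squares solution of the extended system coincides with the usual MMSE filter output. The sketch below checks this numerically under the assumption of unit-energy symbols, so that the MMSE filter is $(\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I}_n)^{-1}\mathbf{H}^*\mathbf{x}$; the helper name `extend` and the toy sizes are hypothetical.

```python
import numpy as np

def extend(H, x, sigma, eps=1.0):
    """Stack [H; eps*sigma*I_n] and [x; 0_n] as in the (quasi) MMSE extension."""
    n = H.shape[1]
    H_ext = np.vstack([H, eps * sigma * np.eye(n)])
    x_ext = np.concatenate([x, np.zeros(n, dtype=complex)])
    return H_ext, x_ext

rng = np.random.default_rng(3)
m, n, sigma = 4, 3, 0.5
H = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)

H_ext, x_ext = extend(H, x, sigma)              # eps = 1: plain MMSE extension
ls_ext = np.linalg.solve(H_ext.conj().T @ H_ext, H_ext.conj().T @ x_ext)
# Assumed unit-energy symbols for the reference MMSE filter
mmse = np.linalg.solve(H.conj().T @ H + sigma**2 * np.eye(n), H.conj().T @ x)
```

Choosing `eps` different from 1 gives the quasi MMSE matrix $\mathbf{H}_\epsilon$, on which the same QR-based detection can then be run.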
By SQRD on $\mathbf{H}_\epsilon$ we obtain $\mathbf{H}_\epsilon = \mathbf{Q}_\epsilon\mathbf{R}_\epsilon$ and get

$$\mathbf{y}_\epsilon = \mathbf{R}_\epsilon\mathbf{s} + \mathbf{z}_\epsilon \qquad (11)$$

where $\mathbf{y}_\epsilon = \mathbf{Q}_\epsilon^*\underline{\mathbf{x}}$ and $\mathbf{z}_\epsilon = \mathbf{Q}_\epsilon^* \begin{bmatrix} \mathbf{n} \\ -\epsilon\sigma\mathbf{s} \end{bmatrix}$. Although $\mathbf{z}_\epsilon$ contains the unknown signal $\mathbf{s}$, $\mathbf{z}_\epsilon$ is
assumed to be a Gaussian noise vector with $\epsilon\sigma\mathbf{s}$ ignored. $\hat{\mathbf{s}}_\epsilon = \arg\min_{\mathbf{s} \in D} \|\mathbf{y}_\epsilon - \mathbf{R}_\epsilon\mathbf{s}\|^2$ and $\hat{\mathbf{s}}_\epsilon$ can
be found by SD. (11) can be divided in the same way as (3), and approximate solutions to $\hat{\mathbf{s}}_\epsilon$
can be obtained by searching all the sub-vectors.

C. Hybrid Algorithms via Divided decoding

Various combinations of two or more detection algorithms can be employed to find solutions
after splitting equations (3), (9), (11) into the form of (5). For example, starting from (4), first
find $\hat{\mathbf{s}}_2$ by SD and cancel $\hat{\mathbf{s}}_2$ from $\mathbf{y}_1$ by calculating $\tilde{\mathbf{y}}_1 = \mathbf{y}_1 - \mathbf{R}_1[1:i_0][i_0+1:n]\,\hat{\mathbf{s}}_2$. Then find
$\hat{\mathbf{s}}_1$ by SIC. Since the SINR of $\mathbf{s}_2$ is roughly no less than that of $\mathbf{s}_1$ owing to column reordering, this hybrid
algorithm reduces error propagation compared with pure SIC and reduces complexity compared with
SD. This combination is in fact the same as finding each sub-vector solution by
SD from (5) with $j_0 = 1, j_1 = 2, \ldots, j_{i_0-2} = i_0 - 1, j_{i_0-1} = i_0$. Instead of SD and SIC, another
combination such as the M-algorithm and SIC, SD and the M-algorithm, or fixed-complexity SD and SIC
can be applied.

IV. Error probability and Complexity

A. Error probability

It is well known that MMSE-SIC or MMSE-OSIC, in its original version and not the
modified version that uses back substitution after transforming the channel matrix into a triangular one,
can achieve the capacity of a given system [23]. Back substitution after MMSE-SQRD or SQRD
of $\mathbf{H}$ and multiplication by $\underline{\mathbf{Q}}^*$ or $\mathbf{Q}^*$ cannot avoid some information loss, due to ignoring the strictly
upper triangular part at each decision step, and fails to achieve the capacity of the system. But
the difference between the BER curves of the former and the latter is small, because the degree
of freedom at each decision step, which is related to the diversity order, is an important factor
in the error rate, and the two have the same degree of freedom at each decision.

Divided decoding with a nontrivial split cannot achieve the capacity of a given system. Even
in the case that the search algorithm for each sub-vector has ML performance, divided decoding
with a nontrivial split has information loss. The total achievable rate of method (ii) is

$$C_d = \sum_{f=1}^{k+2} E_{\mathbf{R}}\left[\log_2 \det\left(\mathbf{I}_{n_f} + \frac{1}{\sigma^2}\mathbf{R}_f\mathbf{P}_f\mathbf{R}_f^*\right)\right] \qquad (12)$$

where $\mathbf{P}_f$ is the covariance matrix of $\mathbf{s}_f$, $n_f = i_{f-1} - i_{f-2}$, and $\mathbf{R}_f = \mathbf{R}[i_{f-2}+1 : i_{f-1}][i_{f-2}+1 : i_{f-1}]$.
There is information loss related to $\mathbf{R}[1 : i_{f-2}][i_{f-2}+1 : i_{f-1}]$. Here $E_{\mathbf{R}}[\cdot]$ denotes the
expectation over $\mathbf{R}$.

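An upper bound of the error probability of a system can be obtained via the union bound over
the pairwise error probabilities, i.e. the average error rate of (1) satisfies

$$P_{err} \le E_{\mathbf{s} \in D}\Big[\sum_{\mathbf{s}' \in D,\ \mathbf{s}' \ne \mathbf{s}} P(\mathbf{s} \to \mathbf{s}')\Big] \qquad (13)$$

where $E_{\mathbf{s} \in D}[\cdot]$ denotes the expectation over $\mathbf{s}$ and $P(\mathbf{s} \to \mathbf{s}')$ the probability that $\mathbf{s}$ is mistaken for a
different vector $\mathbf{s}'$. For each fixed (or estimated at the receiver) $\mathbf{H}$,

$$P(\mathbf{s} \to \mathbf{s}') = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{\|\mathbf{H}(\mathbf{s}-\mathbf{s}')\|^2/(2\sigma^2)}}^{\infty} e^{-t^2/2}\,dt \qquad (14)$$

when we use a detector finding the ML solution. We let $Q(a) := \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-t^2/2}\,dt$. If we use
divided decoding which splits (3) into the form (5) with $k \ge 0$, then for each sub-vector $\mathbf{s}_f$
the pairwise probability $P(\mathbf{s}_f \to \mathbf{s}'_f)$ is calculated as follows. For $f = k+2$,

$$P(\mathbf{s}_{k+2} \to \mathbf{s}'_{k+2}) = Q\left(\sqrt{\frac{\|\mathbf{R}_{k+2}(\mathbf{s}_{k+2} - \mathbf{s}'_{k+2})\|^2}{2\sigma^2}}\right),$$

and for $f < k+2$, writing $\mathcal{E}_f$ for the event $\{\hat{\mathbf{s}}_{k+2} = \mathbf{s}_{k+2}, \hat{\mathbf{s}}_{k+1} = \mathbf{s}_{k+1}, \ldots, \hat{\mathbf{s}}_{f+1} = \mathbf{s}_{f+1}\}$ and $\mathcal{E}_f^c$ for its complement,

$$P(\mathbf{s}_f \to \mathbf{s}'_f) = P(\mathbf{s}_f \to \mathbf{s}'_f \mid \mathcal{E}_f)\,P(\mathcal{E}_f) + P(\mathbf{s}_f \to \mathbf{s}'_f \mid \mathcal{E}_f^c)\,P(\mathcal{E}_f^c).$$

We have

$$P(\mathbf{s}_f \to \mathbf{s}'_f \mid \mathcal{E}_f) = Q\left(\sqrt{\frac{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right),$$

whereas $P(\mathbf{s}_f \to \mathbf{s}'_f \mid \mathcal{E}_f^c)$ is in general not equal to $Q\big(\sqrt{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2/(2\sigma^2)}\big)$,
because the middle of the distribution of $\tilde{\mathbf{y}}_f$, which is the mean of $\tilde{\mathbf{y}}_f$, under the condition
that $[\hat{\mathbf{s}}_{f+1}^T \ \cdots \ \hat{\mathbf{s}}_{k+2}^T]^T \ne [\mathbf{s}_{f+1}^T \ \cdots \ \mathbf{s}_{k+2}^T]^T$ is not $\mathbf{R}_f\mathbf{s}_f$. Thus, we have the following
inequality.

Proposition 1: For fixed $\mathbf{H}$, the error probability $P_{err}$ for the divided decoding (5) satisfies

$$P_{err} \le \sum_{f=1}^{k+2} E_{\mathbf{s}_f \in D_f}\left[\sum_{\mathbf{s}'_f \in D_f,\ \mathbf{s}'_f \ne \mathbf{s}_f} Q\left(\sqrt{\frac{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right] \qquad (15)$$

where $D_f$ is the $n_f$-dimensional subset of $D$.

Proof: $P_{err} \le \sum_{f=1}^{k+2} P_{err,f}$ where $P_{err,f}$ is the error probability in searching $\mathbf{s}_f$. And,
$P_{err,f} \le E_{\mathbf{s}_f \in D_f}\big[\sum_{\mathbf{s}'_f \in D_f,\ \mathbf{s}'_f \ne \mathbf{s}_f} Q\big(\sqrt{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2/(2\sigma^2)}\big)\big]$
by the above argument. □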
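For a fixed channel, the right-hand side of the Proposition 1 bound can be evaluated directly for small blocks. The sketch below does so with $Q(a)$ computed from the complementary error function. It assumes a uniform distribution over each $D_f$, a real BPSK-like alphabet and the plain NumPy QR; the function names and toy sizes are hypothetical.

```python
import itertools
import math
import numpy as np

def qfunc(a):
    """Gaussian tail Q(a) = (1/sqrt(2*pi)) * integral_a^inf exp(-t^2/2) dt."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def prop1_bound(R, splits, alphabet, sigma2):
    """Sum over blocks of E_s[ sum_{s' != s} Q(sqrt(||R_f (s - s')||^2 / (2 sigma^2))) ]."""
    n = R.shape[0]
    bounds = [0] + list(splits) + [n]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        Rf = R[lo:hi, lo:hi]                       # diagonal block R_f
        pts = [np.array(c, dtype=complex)
               for c in itertools.product(alphabet, repeat=hi - lo)]
        acc = 0.0
        for s, sp in itertools.permutations(pts, 2):   # ordered pairs with s != s'
            d2 = np.linalg.norm(Rf @ (s - sp)) ** 2
            acc += qfunc(math.sqrt(d2 / (2.0 * sigma2)))
        total += acc / len(pts)                    # assumed uniform prior on s_f
    return total

rng = np.random.default_rng(6)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
_, R = np.linalg.qr(H)
bpsk = [-1.0, 1.0]
bound_low_snr = prop1_bound(R, [2], bpsk, sigma2=1.0)
bound_high_snr = prop1_bound(R, [2], bpsk, sigma2=0.1)
```

As a union bound the value can exceed 1 at low SNR; it is informative mainly at moderate to high SNR.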
To find the average error probability or its bound, when the channel matrix is not fixed but has 
some specific properties, we need the following lemma. 

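Lemma 1: Let $\mathbf{H}$ be an $m \times n$ ($m \ge n$) random matrix with independently distributed columns
such that each column has a distribution that is rotationally invariant from the left, i.e. for any
$m \times m$ unitary matrix $\mathbf{\Theta}$ the distribution of the $i$th column, $\mathbf{H}[1:m][i:i]$, is equal to the distribution
of $\mathbf{\Theta}\mathbf{H}[1:m][i:i]$. Then $\mathbf{Q}$ and $\mathbf{R}$, which constitute a thin QR decomposition $\mathbf{H} = \mathbf{Q}\mathbf{R}$ with
the diagonal entries of $\mathbf{R}$ non-negative, satisfy the following:

1) $\mathbf{Q}$ and $\mathbf{R}$ are independent random matrices.

2) The distribution of $\mathbf{Q}$ is invariant under left-multiplication by any $m \times m$ unitary matrix,
i.e., $\mathbf{Q}$ has an isotropic distribution.

3) Considering the split form (5) and the notation $\mathbf{R}_f = \mathbf{R}[i_{f-2}+1 : i_{f-1}][i_{f-2}+1 : i_{f-1}]$
for each $1 \le f \le k+2$, $\mathbf{R}_f$ has the same distribution as the upper triangular matrix
obtained from the QRD of $\mathbf{H}_f$, and $\mathbf{R}_f^*\mathbf{R}_f$ has the same distribution as $\mathbf{H}_f^*\mathbf{H}_f$, where
$\mathbf{H}_f = \mathbf{H}[i_{f-2}+1 : m][i_{f-2}+1 : i_{f-1}]$, i.e.

$$\begin{aligned}
\mathbf{R}_1^*\mathbf{R}_1 &\overset{d}{=} \mathbf{H}[1:m][1:i_0]^*\,\mathbf{H}[1:m][1:i_0] \\
\mathbf{R}_2^*\mathbf{R}_2 &\overset{d}{=} \mathbf{H}[i_0+1:m][i_0+1:i_1]^*\,\mathbf{H}[i_0+1:m][i_0+1:i_1] \\
&\ \ \vdots \\
\mathbf{R}_f^*\mathbf{R}_f &\overset{d}{=} \mathbf{H}[i_{f-2}+1:m][i_{f-2}+1:i_{f-1}]^*\,\mathbf{H}[i_{f-2}+1:m][i_{f-2}+1:i_{f-1}] \\
&\ \ \vdots \\
\mathbf{R}_{k+2}^*\mathbf{R}_{k+2} &\overset{d}{=} \mathbf{H}[i_k+1:m][i_k+1:n]^*\,\mathbf{H}[i_k+1:m][i_k+1:n]
\end{aligned} \qquad (16)$$

where $\mathbf{A} \overset{d}{=} \mathbf{B}$ denotes that $\mathbf{A}$ has the same distribution as $\mathbf{B}$.

Proof: The proof of this lemma stems from the proof of Lemma 1 of [10] and the results
of [24]; to prove item 3) we add some steps and statements.
$\mathbf{Q}$ is the partial matrix composed of the first $n$ columns of an $m \times m$ unitary matrix $\mathbf{Q}_0$, where
$\mathbf{H} = \mathbf{Q}_0 \begin{bmatrix} \mathbf{R} \\ \mathbf{0} \end{bmatrix}$ is a full version of the QRD of $\mathbf{H}$, i.e. $\mathbf{Q} = \mathbf{Q}_0[1:m][1:n]$. $\mathbf{Q}_0$ and $\mathbf{R}$ are independent and
$\mathbf{Q}_0$ is isotropically distributed, by Lemma 1 of [10]. Thus 1) and 2) follow immediately.

Since the columns of $\mathbf{H}$ are independent, the probability that $\mathbf{H}$ has full column rank is 1.
The columns of any sub-matrix $\mathbf{G}$ of $\mathbf{H}$ are independent and $\mathbf{G}$ has full column rank with
probability 1. Therefore, the upper triangular matrix with non-negative diagonal entries which
constitutes the QRD of $\mathbf{G}$ is unique, and the thin QRD of $\mathbf{G}$ with the diagonal entries of the upper
triangular matrix non-negative is unique ($\mathbf{H}$ itself being one such $\mathbf{G}$). From now on the diagonal entries of
the triangular matrix of a QRD are non-negative. Let $\mathbf{H}_1 = \mathbf{H}[1:m][1:i_0]$ be QR decomposed
as

$$\mathbf{H}_1 = \mathbf{Q}_1 \begin{bmatrix} \mathbf{T}_1 \\ \mathbf{0} \end{bmatrix}$$

where $\mathbf{Q}_1$ is $m \times m$ unitary and $\mathbf{T}_1$ is $i_0 \times i_0$ upper triangular. Applying $\mathbf{Q}_1^*$ to the full $\mathbf{H}$ we have

$$\mathbf{Q}_1^*\mathbf{H} = \begin{bmatrix} \mathbf{T}_1 & \mathbf{A}_1 \\ \mathbf{0} & \tilde{\mathbf{H}}_1 \end{bmatrix}, \quad \text{where} \quad \begin{bmatrix} \mathbf{A}_1 \\ \tilde{\mathbf{H}}_1 \end{bmatrix} = \mathbf{Q}_1^*\mathbf{H}[1:m][i_0+1:n].$$

$\begin{bmatrix} \mathbf{A}_1 \\ \tilde{\mathbf{H}}_1 \end{bmatrix}$ is independent of $\mathbf{Q}_1$ and

$$\begin{bmatrix} \mathbf{A}_1 \\ \tilde{\mathbf{H}}_1 \end{bmatrix} \overset{d}{=} \mathbf{H}[1:m][i_0+1:n]$$

by the rotational invariance of the columns of $\mathbf{H}$. Thus $\tilde{\mathbf{H}}_1 \overset{d}{=} \mathbf{H}[i_0+1:m][i_0+1:n]$ and
$\tilde{\mathbf{H}}_1[1:m-i_0][1:n_2] \overset{d}{=} \mathbf{H}_2$, recalling $n_f = i_{f-1} - i_{f-2}$. Let $\tilde{\mathbf{H}}_1[1:m-i_0][1:n_2]$ be QR
decomposed as

$$\tilde{\mathbf{H}}_1[1:m-i_0][1:n_2] = \mathbf{Q}_2 \begin{bmatrix} \mathbf{T}_2 \\ \mathbf{0} \end{bmatrix}$$

where $\mathbf{Q}_2$ is $(m-i_0) \times (m-i_0)$ unitary and $\mathbf{T}_2$ is $n_2 \times n_2$ upper triangular. Then we have

$$\mathbf{Q}_2^*\tilde{\mathbf{H}}_1 = \begin{bmatrix} \mathbf{T}_2 & \mathbf{A}_2 \\ \mathbf{0} & \tilde{\mathbf{H}}_2 \end{bmatrix}, \quad \text{where} \quad \begin{bmatrix} \mathbf{A}_2 \\ \tilde{\mathbf{H}}_2 \end{bmatrix} = \mathbf{Q}_2^*\tilde{\mathbf{H}}_1[1:m-i_0][n_2+1:n-i_0] \overset{d}{=} \tilde{\mathbf{H}}_1[1:m-i_0][n_2+1:n-i_0].$$

Hence $\tilde{\mathbf{H}}_2 \overset{d}{=} \mathbf{H}[i_1+1:m][i_1+1:n]$ and $\tilde{\mathbf{H}}_2[1:m-i_1][1:n_3] \overset{d}{=} \mathbf{H}_3$. For $3 \le f \le k+2$,
$\tilde{\mathbf{H}}_{f-1} \overset{d}{=} \mathbf{H}[i_{f-2}+1:m][i_{f-2}+1:n]$ and $\tilde{\mathbf{H}}_{f-1}[1:m-i_{f-2}][1:n_f] \overset{d}{=} \mathbf{H}_f$. $\tilde{\mathbf{H}}_{f-1}[1:m-i_{f-2}][1:n_f]$
is QR decomposed as

$$\tilde{\mathbf{H}}_{f-1}[1:m-i_{f-2}][1:n_f] = \mathbf{Q}_f \begin{bmatrix} \mathbf{T}_f \\ \mathbf{0} \end{bmatrix}$$

where $\mathbf{Q}_f$ is $(m-i_{f-2}) \times (m-i_{f-2})$ unitary and $\mathbf{T}_f$ is $n_f \times n_f$ upper triangular. Now, we have

$$\mathbf{H} = \mathbf{Q}_1 \begin{bmatrix} \mathbf{T}_1 & \mathbf{A}_1 \\ \mathbf{0} & \tilde{\mathbf{H}}_1 \end{bmatrix}
= \mathbf{Q}_1 \begin{bmatrix} \mathbf{I}_{n_1} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_2 \end{bmatrix}
\begin{bmatrix} \mathbf{T}_1 & \mathbf{A}_1 \\ \mathbf{0} & \begin{matrix} \mathbf{T}_2 & \mathbf{A}_2 \\ \mathbf{0} & \tilde{\mathbf{H}}_2 \end{matrix} \end{bmatrix}
= \cdots
= \mathbf{Q}_1 \begin{bmatrix} \mathbf{I}_{n_1} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_2 \end{bmatrix} \cdots \begin{bmatrix} \mathbf{I}_{i_k} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_{k+2} \end{bmatrix}
\begin{bmatrix} \mathbf{T}_1 & \mathbf{A}_1 \\ \mathbf{0} & \begin{matrix} \mathbf{T}_2 & \mathbf{A}_2 \\ \mathbf{0} & \begin{matrix} \ddots & \vdots \\ \mathbf{0} & \mathbf{T}_{k+2} \\ \mathbf{0} & \mathbf{0} \end{matrix} \end{matrix} \end{bmatrix}.$$

We have, with probability 1,

$$\mathbf{R} = \begin{bmatrix} \mathbf{T}_1 & \mathbf{A}_1 \\ \mathbf{0} & \begin{matrix} \mathbf{T}_2 & \mathbf{A}_2 \\ \mathbf{0} & \begin{matrix} \ddots & \vdots \\ \mathbf{0} & \mathbf{T}_{k+2} \end{matrix} \end{matrix} \end{bmatrix} \qquad (17)$$

and $\mathbf{R}_f = \mathbf{T}_f$ for all $1 \le f \le k+2$. By the rotational invariance, this concludes the third
statement. □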
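Item 3) of Lemma 1 can be probed with a quick Monte Carlo experiment. The sketch below compares only the sample means of $\mathbf{R}_2^*\mathbf{R}_2$ and $\mathbf{H}_2^*\mathbf{H}_2$ for an i.i.d. complex Gaussian $\mathbf{H}$ with unit-variance entries and a single split at $i_0$; agreement of the means is a necessary consequence of equality in distribution, not a proof of it. The dimensions, trial count and use of the unsorted NumPy QR are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, i0 = 6, 4, 1            # two blocks: sizes i0 = 1 and n - i0 = 3
trials = 4000
acc_R = np.zeros((n - i0, n - i0))
acc_H = np.zeros((n - i0, n - i0))
for _ in range(trials):
    H = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    _, R = np.linalg.qr(H)
    R2 = R[i0:, i0:]          # R[i0+1:n][i0+1:n] in the text's notation
    H2 = H[i0:, i0:]          # H[i0+1:m][i0+1:n] in the text's notation
    acc_R += (R2.conj().T @ R2).real
    acc_H += (H2.conj().T @ H2).real
mean_R = acc_R / trials
mean_H = acc_H / trials
expected = (m - i0) * np.eye(n - i0)   # E[H2^* H2] for unit-variance entries
```

Both sample means should be close to $(m - i_0)\mathbf{I}$, reflecting that the second block effectively sees only $m - i_0$ receive dimensions.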
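Lemma 1 also holds when column sorting intervenes during the QR-decomposition. Now, if
$\mathbf{H}$ is a random matrix satisfying the condition of Lemma 1, we have

$$E_{\mathbf{H}}\left[Q\left(\sqrt{\frac{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right] = E_{\mathbf{H}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right]$$

and the following result.

Theorem 1: If the random matrix $\mathbf{H}$ satisfies the condition of Lemma 1, then the average error
probability $P_{err}$ for the divided decoding (5) satisfies

$$P_{err} \le \sum_{f=1}^{k+2} E_{\mathbf{s}_f \in D_f}\left[\sum_{\mathbf{s}'_f \in D_f,\ \mathbf{s}'_f \ne \mathbf{s}_f} E_{\mathbf{H}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right]\right] \qquad (18)$$

and if $\mathbf{h}_f := \mathrm{vec}(\mathbf{H}_f)$ has a multi-dimensional complex normal distribution with mean $\mathbf{0}$ and
covariance matrix $\mathbf{\Upsilon}_f$, i.e. $\mathbf{h}_f \sim N_c(\mathbf{0}, \mathbf{\Upsilon}_f)$, then

$$E_{\mathbf{h}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right] \le \left|\mathbf{I}_{(m-i_{f-2})} + \frac{1}{4\sigma^2}(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{\Upsilon}_f(\mathbf{S}_f - \mathbf{S}'_f)^*\right|^{-1} \le \prod_{i=1}^{k_f}\left(\frac{\varepsilon_{(f,i)}}{4\sigma^2}\right)^{-1} \qquad (19)$$

where $\mathbf{S}_f := \mathbf{s}_f^T \otimes \mathbf{I}_{(m-i_{f-2})}$, $\mathbf{S}'_f := (\mathbf{s}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}$, $k_f := \mathrm{rank}\{(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{\Upsilon}_f(\mathbf{S}_f - \mathbf{S}'_f)^*\}$,
$\{\varepsilon_{(f,1)}, \ldots, \varepsilon_{(f,k_f)}\}$ are the nonzero eigenvalues of $(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{\Upsilon}_f(\mathbf{S}_f - \mathbf{S}'_f)^*$, $(\cdot)^T$ denotes the
transpose, $\otimes$ means the Kronecker product, and $|\cdot|$ the determinant of a matrix. Here $\mathrm{vec}(\mathbf{H})$ of an $m \times n$
matrix $\mathbf{H}$ is defined as the column vector obtained by stacking the columns $\mathbf{H}[:][1], \ldots, \mathbf{H}[:][n]$, where $\mathbf{H}[:][i]$ is the $i$-th column of $\mathbf{H}$.

Proof: First, (18) is proved as follows:

$$P_{err} \le E_{\mathbf{H}}\left[\sum_{f=1}^{k+2} E_{\mathbf{s}_f \in D_f}\left[\sum_{\mathbf{s}'_f \ne \mathbf{s}_f} Q\left(\sqrt{\frac{\|\mathbf{R}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right]\right] \ \text{(by Proposition 1)}
= \sum_{f=1}^{k+2} E_{\mathbf{s}_f \in D_f}\left[\sum_{\mathbf{s}'_f \ne \mathbf{s}_f} E_{\mathbf{H}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right]\right].$$

Secondly, adopting the approach of [25], by the Chernoff bound we have

$$Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right) \le \exp\left(-\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{4\sigma^2}\right) = \exp\left(-\frac{\|(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{h}_f\|^2}{4\sigma^2}\right).$$

The covariance matrix of $(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{h}_f$ is $(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{\Upsilon}_f(\mathbf{S}_f - \mathbf{S}'_f)^*$
and $\|(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{h}_f\|^2 \overset{d}{=} \sum_{i=1}^{k_f} x_{f,i}$, where the $\{x_{f,i}\}_{i=1}^{k_f}$ are independent and the
density function $p_{x_{f,i}}(x)$ of $x_{f,i}$ is $\frac{1}{\varepsilon_{(f,i)}}\exp\left(-\frac{x}{\varepsilon_{(f,i)}}\right)$. Hence we get

$$\begin{aligned}
E_{\mathbf{H}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f)\|^2}{2\sigma^2}}\right)\right]
&\le E_{\mathbf{h}_f}\left[\exp\left(-\frac{\|(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{h}_f\|^2}{4\sigma^2}\right)\right] \\
&= \prod_{i=1}^{k_f} \int_0^{\infty} \frac{1}{\varepsilon_{(f,i)}}\exp\left(-\frac{x}{\varepsilon_{(f,i)}}\right)\exp\left(-\frac{x}{4\sigma^2}\right)dx \\
&= \prod_{i=1}^{k_f}\left(1 + \frac{\varepsilon_{(f,i)}}{4\sigma^2}\right)^{-1} \\
&= \left|\mathbf{I}_{k_f} + \frac{1}{4\sigma^2}\mathrm{diag}(\varepsilon_{(f,1)}, \ldots, \varepsilon_{(f,k_f)})\right|^{-1} \\
&= \left|\mathbf{I}_{(m-i_{f-2})} + \frac{1}{4\sigma^2}(\mathbf{S}_f - \mathbf{S}'_f)\mathbf{\Upsilon}_f(\mathbf{S}_f - \mathbf{S}'_f)^*\right|^{-1}.
\end{aligned} \qquad (20)$$

Since $1 + \varepsilon_{(f,i)}/(4\sigma^2) > \varepsilon_{(f,i)}/(4\sigma^2)$, the second inequality of (19) is obviously true. □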
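Two algebraic facts carry the proof above: the identity $\mathbf{H}_f(\mathbf{s}_f - \mathbf{s}'_f) = (\mathbf{S}_f - \mathbf{S}'_f)\mathbf{h}_f$, and the determinant form of the averaged Chernoff bound. The sketch below checks both numerically for the special case $\mathbf{\Upsilon}_f = \rho^2\mathbf{I}$; the sizes, the difference vector `delta` and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 4, 3                                   # H_f is p x q, s_f has q entries
Hf = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
delta = np.array([2.0, -2.0 + 2.0j, 0.0])     # a hypothetical difference s_f - s_f'

h = Hf.reshape(-1, order="F")                 # vec(H_f): stack the columns
S = np.kron(delta[None, :], np.eye(p))        # (s_f - s_f')^T kron I_p
lhs = Hf @ delta
rhs = S @ h

# With Upsilon_f = rho2 * I, S Upsilon_f S^* equals rho2 * ||delta||^2 * I_p
rho2, sigma2 = 1.0, 0.5
M = rho2 * (S @ S.conj().T)
det_form = 1.0 / np.linalg.det(np.eye(p) + M / (4 * sigma2)).real
closed_form = (1.0 + rho2 * np.linalg.norm(delta) ** 2 / (4 * sigma2)) ** (-p)
```

In this isotropic special case the determinant collapses to a scalar raised to the power $p$, the number of rows of $\mathbf{H}_f$.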



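Proposition 1 and Theorem 1 can be generalized to the case where divided decoding is used to find an
$n \times r$ matrix $\mathbf{X}$ from an $m \times r$ matrix $\mathbf{Y}$ such that

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{E}, \qquad (21)$$

where the entries of $\mathbf{E}$ are independent complex Gaussian random variables with mean zero and
variance $\sigma^2$. Let $\mathbf{X}_f := \mathbf{X}[i_{f-2}+1 : i_{f-1}][1 : r]$, let $\mathfrak{D}$ be the domain that $\mathbf{X}$ belongs to, and $\mathfrak{D}_f$ the
domain that $\mathbf{X}_f$ belongs to.

Proposition 2: For fixed $\mathbf{H}$, the error probability $P_{err}$ in detecting $\mathbf{X}$ by using divided decoding
with each sub-matrix $\mathbf{X}_f$ found by a detector searching for the ML point satisfies

$$P_{err} \le \sum_{f=1}^{k+2} E_{\mathbf{X}_f \in \mathfrak{D}_f}\left[\sum_{\mathbf{X}'_f \in \mathfrak{D}_f,\ \mathbf{X}'_f \ne \mathbf{X}_f} Q\left(\sqrt{\frac{\|\mathbf{R}_f(\mathbf{X}_f - \mathbf{X}'_f)\|^2}{2\sigma^2}}\right)\right]. \qquad (22)$$

Proof: $P_{err} \le \sum_{f=1}^{k+2} P_{err,f}$ where $P_{err,f}$ is the error probability in searching $\mathbf{X}_f$. The
remainder is similar to the proof of Proposition 1. □

Theorem 2: If the random matrix $\mathbf{H}$ satisfies the condition of Lemma 1, then the average error
probability $P_{err}$ for the divided decoding of (21) satisfies

$$P_{err} \le \sum_{f=1}^{k+2} E_{\mathbf{X}_f \in \mathfrak{D}_f}\left[\sum_{\mathbf{X}'_f \in \mathfrak{D}_f,\ \mathbf{X}'_f \ne \mathbf{X}_f} E_{\mathbf{H}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{X}_f - \mathbf{X}'_f)\|^2}{2\sigma^2}}\right)\right]\right] \qquad (23)$$

and if $\mathbf{h}_f = \mathrm{vec}(\mathbf{H}_f) \sim N_c(\mathbf{0}, \mathbf{\Upsilon}_f)$ then

$$E_{\mathbf{h}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{X}_f - \mathbf{X}'_f)\|^2}{2\sigma^2}}\right)\right]
\le \left|\mathbf{I}_{r(m-i_{f-2})} + \frac{1}{4\sigma^2}\big[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}\big]\mathbf{\Upsilon}_f\big[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}\big]^*\right|^{-1}
\le \prod_{i=1}^{k_f}\left(\frac{\varepsilon_{(f,i)}}{4\sigma^2}\right)^{-1} \qquad (24)$$

where $k_f := \mathrm{rank}\{[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}]\mathbf{\Upsilon}_f[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}]^*\}$, and $\{\varepsilon_{(f,1)}, \ldots, \varepsilon_{(f,k_f)}\}$
are the nonzero eigenvalues of $[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}]\mathbf{\Upsilon}_f[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{(m-i_{f-2})}]^*$.

Proof: The proof is a simple extension of the proof of Theorem 1. □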
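If we assume $\mathrm{vec}(\mathbf{H}) \sim N_c(\mathbf{0}, \rho^2\mathbf{I}_{mn})$, then $\mathbf{\Upsilon}_f = \rho^2\mathbf{I}_{d_f}$ where $d_f = (m - i_{f-2})(i_{f-1} - i_{f-2})$
and we have

$$\begin{aligned}
E_{\mathbf{h}_f}\left[Q\left(\sqrt{\frac{\|\mathbf{H}_f(\mathbf{X}_f - \mathbf{X}'_f)\|^2}{2\sigma^2}}\right)\right]
&\le \left|\mathbf{I}_{r(m-i_{f-2})} + \frac{\rho^2}{4\sigma^2}\big[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{m-i_{f-2}}\big]\big[(\mathbf{X}_f - \mathbf{X}'_f)^T \otimes \mathbf{I}_{m-i_{f-2}}\big]^*\right|^{-1} \\
&= \left|\Big[\mathbf{I}_r + \frac{\rho^2}{4\sigma^2}(\mathbf{X}_f - \mathbf{X}'_f)^T\big((\mathbf{X}_f - \mathbf{X}'_f)^T\big)^*\Big] \otimes \mathbf{I}_{m-i_{f-2}}\right|^{-1} \\
&= \left|\mathbf{I}_r + \frac{\rho^2}{4\sigma^2}(\mathbf{X}_f - \mathbf{X}'_f)^T\big((\mathbf{X}_f - \mathbf{X}'_f)^T\big)^*\right|^{-(m-i_{f-2})} \\
&\le \left|(\mathbf{X}_f - \mathbf{X}'_f)(\mathbf{X}_f - \mathbf{X}'_f)^*\right|^{-(m-i_{f-2})} \cdot \left(\frac{\rho^2}{4\sigma^2}\right)^{-d_f}.
\end{aligned} \qquad (25)$$

Hence, we get $P_{err,f} \le \left(\frac{\rho^2}{4\sigma^2}\right)^{-d_f} \cdot G_f$ where

$$G_f = E_{\mathbf{X}_f \in \mathfrak{D}_f}\left[\sum_{\mathbf{X}'_f \in \mathfrak{D}_f,\ \mathbf{X}'_f \ne \mathbf{X}_f} \left|(\mathbf{X}_f - \mathbf{X}'_f)(\mathbf{X}_f - \mathbf{X}'_f)^*\right|^{-(m-i_{f-2})}\right]$$

and the diversity order of $P_{err,f}$ is $d_f$. The diversity order of $P_{err} = \sum_{f=1}^{k+2} P_{err,f}$ is a combination
of $\{d_f\}_{f=1}^{k+2}$.

When (3) (or (11), more generally (21)) is split according to both $\{i_0, \ldots, i_k\}$ and $\{j_0, \ldots, j_k\}$
and all sub-vectors are detected by an ML decoder, for example SD, then even if the sets of sub-vector
sizes are equal, i.e. $\{(i_{f-1} - i_{f-2})\}_{f=1}^{k+2} = \{(j_{f-1} - j_{f-2})\}_{f=1}^{k+2}$, the diversity orders and error
rates of the two are different, and significantly different in many cases. On the other hand, the
complexities of the two are not so different, which will be explained with simulation results in
the next section.

Example 1: Consider the example of $\{i_0 = 1\}$ and $\{j_0 = n-1\}$, where $\mathrm{vec}(\mathbf{H}) \sim N_c(\mathbf{0}, \rho^2\mathbf{I}_{mn})$
and the sets of sub-vector sizes of these two are both equal to $\{1, n-1\}$. But then we have

$$P_{err}(\{i_0 = 1\}) \le G_1(i_0) \cdot \left(\frac{\rho^2}{4\sigma^2}\right)^{-m} + G_2(i_0) \cdot \left(\frac{\rho^2}{4\sigma^2}\right)^{-(m-1)(n-1)}$$

and

$$P_{err}(\{j_0 = n-1\}) \le G'_1(j_0) \cdot \left(\frac{\rho^2}{4\sigma^2}\right)^{-m+n-1} + G'_2(j_0) \cdot \left(\frac{\rho^2}{4\sigma^2}\right)^{-m(n-1)}$$

where

$$\begin{aligned}
G_1(i_0) &= E_{\{\mathbf{s}_1\}}\Big[\sum_{\{\mathbf{s}'_1\}} \left|(\mathbf{s}_1 - \mathbf{s}'_1)(\mathbf{s}_1 - \mathbf{s}'_1)^*\right|^{-m}\Big], \\
G_2(i_0) &= E_{\{\mathbf{s}_2\}}\Big[\sum_{\{\mathbf{s}'_2\}} \left|(\mathbf{s}_2 - \mathbf{s}'_2)(\mathbf{s}_2 - \mathbf{s}'_2)^*\right|^{-m+1}\Big], \\
G'_2(j_0) &= E_{\{\mathbf{s}_2\}}\Big[\sum_{\{\mathbf{s}'_2\}} \left|(\mathbf{s}_2 - \mathbf{s}'_2)(\mathbf{s}_2 - \mathbf{s}'_2)^*\right|^{-m}\Big], \\
G'_1(j_0) &= E_{\{\mathbf{s}_1\}}\Big[\sum_{\{\mathbf{s}'_1\}} \left|(\mathbf{s}_1 - \mathbf{s}'_1)(\mathbf{s}_1 - \mathbf{s}'_1)^*\right|^{-m+n-1}\Big],
\end{aligned}$$

$\mathbf{s}_1 = \mathbf{s}[1:1]$, $\mathbf{s}_2 = \mathbf{s}[2:n]$.
This example shows that $P_{err}(\{i_0\})$ has a larger diversity order and is at the same time much
lower than $P_{err}(\{j_0\})$ if $m, n > 2$.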
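The exponents $d_f = (m - i_{f-2})(i_{f-1} - i_{f-2})$ appearing in the bounds above are easy to tabulate for any splitting index set. The following sketch does this for Example 1 with the hypothetical choice $m = n = 8$; the function name `diversity_exponents` is an assumption of this illustration. It relies on the standard fact that, at high SNR, a sum of terms of the form $c_f \cdot \mathrm{SNR}^{-d_f}$ is dominated by the term with the smallest exponent.

```python
def diversity_exponents(m, n, splits):
    """Return [d_1, ..., d_{k+2}] with d_f = (m - i_{f-2}) * (i_{f-1} - i_{f-2})."""
    bounds = [0] + list(splits) + [n]          # i_{-1} = 0, ..., i_{k+1} = n
    return [(m - lo) * (hi - lo) for lo, hi in zip(bounds[:-1], bounds[1:])]

m = n = 8
d_i = diversity_exponents(m, n, [1])           # split {i_0 = 1}
d_j = diversity_exponents(m, n, [n - 1])       # split {j_0 = n - 1}
dominant_i = min(d_i)                          # smallest exponent dominates
dominant_j = min(d_j)
```

For this choice the split $\{i_0 = 1\}$ gives exponents 8 and 49, while $\{j_0 = n-1\}$ gives 56 and 1, so the second split is limited by the single symbol that is detected with only $m - n + 1 = 1$ effective receive dimension.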

Example 1 is the simplest comparison; a generalized version can be obtained for the pair of
$\{i_0 = 1\}$ and $\{j_0 = n-1\}$ and, more broadly, for a class of sets of the form $\{i_0, \ldots, i_k\}$
whose resultant sets of sub-vector sizes are identical. From this reasoning we have the following
conjecture.

Conjecture 1: If $\hat{\mathbf{s}}$, $\hat{\mathbf{s}}_M$, or $\hat{\mathbf{s}}_\epsilon$ is approximated by divided decoding with SD according to a
splitting index set $\{i_0, i_1, \ldots, i_k\}$ ($1 \le i_0 < i_1 < \cdots < i_k < n$) whose sub-vector size set is
fixed as $\{n_f\}_{f=1}^{k+2}$, $n_f = i_{f-1} - i_{f-2}$, then the index set $\{i_0, i_1, \ldots, i_k\}$ for which $n_1 \le n_2 \le
\cdots \le n_{k+2}$ is the best choice, i.e. it makes the error rate and the complexity least at the same
time.

The reason that this choice minimizes the complexity under a fixed $\{n_f\}_{f=1}^{k+2}$ is that the error
propagation from the sub-vectors previously found is least at each step of searching for the present
sub-vector solution by SD, and the complexity of SD depends on the SNR and the sub-vector size.

B. Complexity

To see roughly the gain in complexity: if we use the full search algorithm then the number
of multiplications required for the computation, excluding QRD, is $2n(n+3)q^n$ for a $q$-QAM
constellation, but if we apply divided decoding which splits the signal vector into $k$ sub-vectors of equal
size and detects each sub-vector by full search then the number of multiplications required is
$2n\big((\tfrac{n}{k} + 3)q^{n/k} + \tfrac{n}{k}(k-1)\big)$. If we apply (4) with full search then the number of multiplications
required is $2(n-i_0)(n-i_0+3)q^{n-i_0} + 4i_0(n-i_0) + 2i_0(i_0+3)q^{i_0}$. The exponent of $q$ depends
on the sub-vector sizes. After QRD, if the complexity of the underlying search algorithm is $f(n)$ and
depends only on $n$, then the complexity of divided decoding with $k$ splits of equal size is
$kf(n/k) + 2n^2(k-1)/k$ ($kf(n/k) + n^2(k-1)/(2k)$ for real systems) and the complexity of
applying (4) is $f(i_0) + f(n-i_0) + 4i_0(n-i_0)$ ($f(i_0) + f(n-i_0) + i_0(n-i_0)$ for real systems).
Here $kf(n/k)$ and $f(i_0) + f(n-i_0)$ multiplications are required for the search, and $2n^2(k-1)/k$ and
$4i_0(n-i_0)$ multiplications are for cancelling.

If a given search algorithm after QRD has complexity $f(n)$ dependent only on the size $n$
of the vector searched, then the complexity of divided decoding based on the search algorithm
is obtained by simple calculation as follows:

Proposition 3: The complexity of divided decoding according to the splitting index set $\{i_0, i_1, \ldots, i_k\}$
with sub-vector size set $\{n_j\}_{j=1}^{k+2}$, $n_j = i_{j-1} - i_{j-2}$, is $\sum_{j=1}^{k+2} f(n_j) + A(\{n_j\}_{j=1}^{k+2})$ where $A(\{n_j\}_{j=1}^{k+2}) =
4\sum_{j=2}^{k+2} n_j i_{j-2}$ for complex systems and $A(\{n_j\}_{j=1}^{k+2}) = \sum_{j=2}^{k+2} n_j i_{j-2}$ for real systems, and in
most cases $A(\{n_j\}_{j=1}^{k+2}) \propto \sum_{j=2}^{k+2} n_j i_{j-2}$.
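The multiplication counts of this subsection can be cross-checked with a few lines of code. The sketch below implements the cancelling cost $A$ of Proposition 3 and the full-search counts quoted above; the function names and the example parameters ($n = 8$, 16-QAM, the particular splits) are hypothetical choices made for illustration.

```python
def cancel_cost(n, splits, complex_system=True):
    """A({n_j}) = 4 * sum_{j>=2} n_j * i_{j-2} (complex) or the plain sum (real)."""
    bounds = [0] + list(splits) + [n]          # i_{-1} = 0, ..., i_{k+1} = n
    total = sum((hi - lo) * lo for lo, hi in zip(bounds[:-1], bounds[1:]))
    return 4 * total if complex_system else total

def full_search_cost(n, q):
    """Multiplications of the full search over q^n points, excluding QRD."""
    return 2 * n * (n + 3) * q ** n

def divided_full_search_cost(n, splits, q):
    """Full search on every sub-vector plus the cancelling cost (complex system)."""
    bounds = [0] + list(splits) + [n]
    search = sum(full_search_cost(hi - lo, q)
                 for lo, hi in zip(bounds[:-1], bounds[1:]))
    return search + cancel_cost(n, splits)

n, q = 8, 16
k = 4
equal_split = [2, 4, 6]                        # k = 4 sub-vectors of size n/k = 2
cancel_equal = cancel_cost(n, equal_split)
two_way = divided_full_search_cost(n, [3], q)  # form (4) with i_0 = 3
undivided = full_search_cost(n, q)
```

The equal-size case reproduces the closed form $2n^2(k-1)/k$ for the cancelling cost, and the two-way split reproduces the three-term expression given in the text for (4).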

If the complexity of a given search algorithm $\mathcal{A}$ after QRD depends on the statistical properties
of $\mathbf{H}$, the SNR, $m$ and $n$, and in particular depends on $\{n, m, \sigma^2\}$, i.e. $f = f(n, m, \sigma^2)$, then we have the
following formula:

Theorem 3: If the random matrix $\mathbf{H}$ satisfies the condition of Lemma 1, then the complexity of
divided decoding according to the splitting index set $\{i_0, i_1, \ldots, i_k\}$, $f_d(n, m, \sigma^2)$, is

$$f_d(n, m, \sigma^2) = \sum_{j=1}^{k+2} f(n_j, m - i_{j-2}, \sigma^2) + A(\{n_j\}_{j=1}^{k+2}).$$

Proof: For each $j$, $1 \le j \le k+2$, divided decoding based on the search algorithm $\mathcal{A}$ finds
$\hat{\mathbf{s}}_j$ from the following equation:

$$\tilde{\mathbf{y}}_j = \mathbf{R}_j\mathbf{s}_j + \mathbf{z}_j.$$

The elements of $\mathbf{z}_j$ are i.i.d. circularly symmetric complex normal variables with mean
zero and variance $\sigma^2$. $\mathbf{R}_j$ has the same distribution as the upper triangular matrix obtained from
the QRD of $\mathbf{H}_j = \mathbf{H}[i_{j-2}+1 : m][i_{j-2}+1 : i_{j-1}]$, by Lemma 1. Therefore the complexity
required for finding $\hat{\mathbf{s}}_j$ is $f(n_j, m - i_{j-2}, \sigma^2)$. Summing $f(n_j, m - i_{j-2}, \sigma^2)$ over $j$ and adding
$A(\{n_j\}_{j=1}^{k+2})$ gives $f_d(n, m, \sigma^2) = \sum_{j=1}^{k+2} f(n_j, m - i_{j-2}, \sigma^2) + A(\{n_j\}_{j=1}^{k+2})$. □
The expected complexity of the SD of Finke and Pohst under a Rayleigh channel, estimated in [10],
depends on $\{n, m, \sigma^2\}$. (We omit the constellation size, which also affects the
expected complexity, since we focus only on the changes brought about by divided decoding.)
According to the estimated formula, the expected complexity of SD depends more on $n$ than on $m$.
The expected complexity of SD with Schnorr-Euchner's strategy is known to be less than that of
Finke and Pohst's in practical experiments because Schnorr-Euchner's search starts with a point
closer to the ML point. The sphere radius determined by the first point (which is the ZF-SIC
solution) of Schnorr-Euchner's search is efficient because it does not need any extra calculation.
The estimate in [10] is an upper bound of the expected complexities of the SD of Schnorr-Euchner
and other advanced SD's. The expected complexity calculated in [10] is a summation of terms, each
of which is a combinatorial number multiplied by
$\gamma(\alpha, (m-n+k)/2) = \int_0^{\alpha} t^{(m-n+k)/2 - 1} e^{-t}\, dt \,/\, \Gamma((m-n+k)/2)$,
where $k$ varies from 1 to $n$ and $\alpha$ depends on $m$ and SNR. The expected complexity of
SD grows exponentially in $n$, but the formula proposed in [10] shows that the complexity is
approximately cubic in $n$ for mid to high SNR and some range of $m$ and $n$. It is hard to find a
closed form for the largest value of $\alpha$ such that $\gamma(\alpha, (m-n+k)/2)$ decreases as
$(m-n+k)/2$ increases and can be ignored once $(m-n+k)/2$ exceeds a suitable threshold. However, we
can roughly describe the behavior of $\gamma(\alpha, (m-n+k)/2)$ as follows. As shown in Fig. 1,
when $\alpha = (m-n+k)/2 - 1$, the value of $\gamma(\alpha, (m-n+k)/2)$ increases as $(m-n+k)/2$
does, satisfies $\gamma(\alpha, (m-n+k)/2) > 0.1$, and cannot be ignored. But when
$\alpha = \frac{1}{2}((m-n+k)/2 - 1)$, $\gamma(\alpha, (m-n+k)/2)$ decreases as $(m-n+k)/2$
increases for $(m-n+k)/2 > 2$, and $\gamma(\alpha, (m-n+k)/2) < 0.05$ for $(m-n+k)/2 > 4$.
$\alpha$ is proportional to $m/(1 + \beta \cdot \mathrm{SNR})$ for some constant $\beta$, and the
number of constituent terms of the expected complexity strongly depends on $n$. The complexity of
SD depends much less on $m$ than on the size $n$ of the vector searched. In the mid to low SNR
range, the slope of complexity versus SNR is very steep for $n > 4$ and increases as $n$ does.
Divided decoding mitigates this increase in slope since the combinatorial terms are summed only
within the sizes of the sub-vectors to obtain the complexity.
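
The two regimes of $\gamma$ described above can be checked numerically. The following is a minimal sketch that evaluates the regularized lower incomplete gamma function with a standard power series, using only the Python standard library. The function name `reg_lower_gamma` and its argument order (upper limit first, shape second, following the notation $\gamma(\alpha, a)$) are choices made for this illustration.

```python
import math


def reg_lower_gamma(alpha: float, a: float, tol: float = 1e-15) -> float:
    """Evaluate gamma(alpha, a) = (1 / Gamma(a)) * int_0^alpha t^(a-1) e^(-t) dt.

    Uses the series gamma(alpha, a) = sum_{n>=0} alpha^(a+n) e^(-alpha) / Gamma(a+n+1),
    which converges quickly when alpha is not much larger than a.
    """
    if alpha <= 0.0:
        return 0.0
    # First term: alpha^a * e^(-alpha) / Gamma(a + 1), computed in the log domain.
    term = math.exp(a * math.log(alpha) - alpha - math.lgamma(a + 1.0))
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= alpha / (a + n)  # ratio of consecutive series terms
        total += term
    return total


def tabulate(max_shape: int):
    """Return (a, gamma(a - 1, a), gamma((a - 1) / 2, a)) for a = 2, ..., max_shape."""
    return [
        (a, reg_lower_gamma(a - 1, a), reg_lower_gamma((a - 1) / 2, a))
        for a in range(2, max_shape + 1)
    ]
```

Calling `tabulate(8)` lists both regimes for integer values of $a = (m-n+k)/2$. Under this sketch the first sequence stays above 0.1 and increases with $a$, while the second decreases from about 0.09 at $a = 2$ and is below 0.05 from $a = 6$ onward.
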

V. Simulation Results 

We generate H so that its entries are i.i.d. circularly symmetric complex normal
variables with mean zero and variance 1.0. We draw 1000 independent realizations of H,
and each generated H remains fixed during 100 symbol times. The transmit data is spatially
multiplexed with n (n = 8) streams and the modulation employed is 16QAM. The entries of the
noise vector $\mathbf{n}$ are generated as i.i.d. circularly symmetric complex normal variables with mean zero
and variance a 2 , where a 2 = 2 -sNmo g2 16 ' m = ^> SNR = E b /N . E b denotes the average 
energy per bit arriving at the receiver. We compare the BER curves of some typical cases together
with their complexities. The algorithm finding each sub-vector is SD, and the enumeration
method used in SD is Schnorr-Euchner's. The complexity is measured by the number
of multiplications required to find the solution, excluding the QRD. Fig. 2 and Fig. 3 show the
BER and complexity curves versus SNR of Example 1. Obviously $P_{err}(\{1\}) < P_{err}(\{7\})$.
$G_2(7)$ is the dominant term in $P_{err}(\{7\})$, and the slope of $\log(P_{err}(\{1\}))$ is much
larger than that of $\log(P_{err}(\{7\}))$. The complexity difference between the two cases is
small but, as predicted in Conjecture 1, the complexity of the $\{i_0 = 1\}$ case is slightly less
than that of the $\{j_0 = 7\}$ case. As noted in the previous section, the complexities of the
$\{i_0 = 1\}$ case and the $\{j_0 = 7\}$ case take the forms
$f(1, m = 8, \sigma^2) + f(n-1 = 7, m-1 = 7, \sigma^2) + 4 \times 7$ and
$f(7, 8, \sigma^2) + f(1, m-n+1 = 1, \sigma^2) + 4 \times 7$
respectively. The complexity is shown to depend more on the first argument of $f$, the size of the
sub-vector, than on the second argument, the number of rows of the sub-matrix of H corresponding to
each sub-vector, even though the effect of the second argument on the slope of the BER curve is
equivalent to that of the first. Notice that the second arguments of $f$ in the $\{i_0 = 1\}$ and
$\{j_0 = 7\}$ cases are $\{8, 7\}$ and $\{8, 1\}$ respectively, while the first arguments form the
same set $\{1, 7\}$. Similar phenomena appear for the pairs ($\{i_0 = 2\}$, $\{i_0 = 6\}$) and
($\{i_0 = 3\}$, $\{i_0 = 5\}$) in Fig. 4 and Fig. 5. On the other hand, the gap between the BER's
and between the slopes of the BER curves of the two members of each pair
($\{i_0 = 1\}$, $\{i_0 = 7\}$), ($\{i_0 = 2\}$, $\{i_0 = 6\}$), ($\{i_0 = 3\}$, $\{i_0 = 5\}$)
decreases as the index difference between the two decreases, where the index difference is equal to
the difference of the two sub-vector sizes related to the pair of indices. As for complexity,
Conjecture 1 is valid for limited ranges of SNR, and all the curves almost coincide in the high SNR
range. Fig. 5 also shows that the complexity decreases as the difference of the two sub-vector
sizes related to $\{i_0\}$ decreases. However, as $i_0$ increases, the BER increases for fixed SNR
and the slope of the BER curve decreases.
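
The channel and signal model of this section can be sketched as follows, using only the Python standard library. This is an illustrative sketch and not the simulation code behind the figures: the helper names are hypothetical, the 16QAM alphabet is assumed to use the odd-integer levels {-3, -1, 1, 3} on each axis, and the noise variance `sigma2` is left as a caller-supplied parameter.

```python
import math
import random

NUM_CHANNELS = 1000          # independent realizations of H
SYMBOLS_PER_CHANNEL = 100    # symbol times during which each H stays fixed
QAM16_LEVELS = (-3, -1, 1, 3)  # assumed per-axis levels of 16QAM


def complex_normal(rng: random.Random, variance: float) -> complex:
    """Draw one circularly symmetric complex normal sample with the given variance."""
    s = math.sqrt(variance / 2.0)  # the variance is split evenly between real and imaginary parts
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def draw_channel(rng: random.Random, m: int, n: int):
    """Draw an m-by-n channel matrix with i.i.d. CN(0, 1) entries."""
    return [[complex_normal(rng, 1.0) for _ in range(n)] for _ in range(m)]


def draw_symbols(rng: random.Random, n: int):
    """Draw n independent 16QAM symbols (one per transmit stream)."""
    return [complex(rng.choice(QAM16_LEVELS), rng.choice(QAM16_LEVELS)) for _ in range(n)]


def receive(rng: random.Random, H, s, sigma2: float):
    """Return x = H s + noise, with i.i.d. CN(0, sigma2) noise entries."""
    return [
        sum(h * sym for h, sym in zip(row, s)) + complex_normal(rng, sigma2)
        for row in H
    ]
```

With 1000 channel realizations and 100 symbol times per realization, a simulation loop built on these helpers produces 100,000 received vectors per SNR point, which agrees with the sample size stated in the figure captions.
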

In Fig. 4 and Fig. 5 we also compare divided decodings according to $\{i_0\}$, $i_0 = 0, 1, \ldots, 7$,
based on SD with both LLL LR aided ZF-SIC and LLL LR aided SIC applied to the MMSE
extended system. Divided decoding according to $\{i_0 = 0\}$ means that the original equation is
not split. Although LR aided MMSE SIC has a BER curve very close to that of SD for a
system with 4-QAM modulation, 4 transmit and 4 receive antennas, and still has a BER
curve close to that of SD for a system with 4-QAM, 6 transmit and 6 receive antennas [11], the
gap gets bigger as the number of transmit (receive) antennas changes from 4 to 6. In our
simulation result with 16QAM, 8 transmit and 8 receive antennas, the gap becomes even bigger,
though the slopes of the BER curves of LR aided SIC's are almost the same as that of SD in the
mid to high SNR range. Divided decodings according to $\{i_0\}$ based on SD for $i_0 = 0, 1, 2, 3, 4$
have better performances than those of LLL LR aided SIC's, and at the same time have lower
computation amounts than LLL LR aided SIC's for SNR's greater than or equal to 14, 8, 5, 2, 
dB respectively. The number of multiplications is counted during lattice reduction process, slicing 
and substitutions for LLL LR aided SIC's, and for fair comparison the multiplications required
for the first QRD are not counted. For SNR's greater than 12 dB, when the channel is steady for
more than 10 symbol times, LLL LR aided SIC's are expected to be more efficient than divided
decoding according to $\{i_0 = 4\}$ based on SD because the error rate difference is slight and the
lattice reduction process is not necessary for at least 10 symbol times. For SNR's less than or
equal to 12 dB, LR aided SIC's do not improve the error rate compared with SIC's (see footnote 2).
If the channel varies fast, divided decoding according to $\{i_0 = 4\}$ with SD outperforms LLL
aided SIC's in both error rate and complexity. Divided decodings according to $i_0 = 1, 2, 3$ also
outperform LLL aided SIC's for wide ranges of SNR.

In Fig. 6 and Fig. 7 we present the BER and complexity versus SNR of divided decoding based
on SD (DSD) applied to (3) with k splits of equal size, k = 1, 2, 3, 4, 8. LLL LR aided SIC's are
also compared. DSD with k = 1 is equal to the full SD, and DSD with k = 8 is equal to ZF-SIC.
When k = 3, the sub-vector sizes are 2.5, 2.5, and 3, where 2.5 means two complex symbols
and the real (or imaginary) part of a symbol. We can see the transition of BER and complexity from
SD to SIC as k varies from 1 to 8. For k > 2, DSD with k splits looks better in complexity than
LLL LR aided SIC's. The error rates of DSD with k = 2, 3 are near those of LLL LR aided
SIC's. The decrease in complexity shrinks as k increases because the decrease in sub-vector size
diminishes.
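
The sub-vector sizes for the equal-size splits, including 2.5, 2.5, 3 for k = 3, can be reproduced by dividing the 2n = 16 real dimensions into k blocks. This is a minimal sketch: the function name `equal_split_sizes` and the rule that leftover real dimensions are assigned to the last block are assumptions, chosen so that the result matches the sizes quoted above.

```python
from typing import List


def equal_split_sizes(n: int, k: int) -> List[float]:
    """Split n complex symbols (2n real dimensions) into k nearly equal blocks.

    Sizes are returned in units of complex symbols, so 2.5 means two complex
    symbols plus the real (or imaginary) part of one more symbol.
    """
    if not 1 <= k <= 2 * n:
        raise ValueError("k must be between 1 and 2n")
    base, rest = divmod(2 * n, k)
    blocks = [base] * k
    blocks[-1] += rest  # assumption: the remainder goes to the last block
    return [b / 2 for b in blocks]
```

For n = 8, `equal_split_sizes(8, 3)` returns `[2.5, 2.5, 3.0]`, while k = 1 gives the single block `[8.0]` (full SD) and k = 8 gives eight blocks of one complex symbol each (ZF-SIC).
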

2 This claim can be verified in Fig. 5 and Fig. 8 for ZF-SIC and MMSE-SIC respectively. In these two figures the graphs for
k = 8 are the same as those of ZF-SIC and MMSE-SIC respectively.

The BER curves of DSD applied to (9), which we call DMSD, in Fig. 8 show that DMSD has
better performance than DSD for each k except k = 1; when k = 1 the error rates are almost
the same. The trends of BER increase and complexity decrease for DMSD, which appear in Fig. 8
and Fig. 9 respectively, are similar to those of DSD, but the rates of increase and decrease are
smaller than those of DSD. Considering both BER and complexity, DMSD with a proper choice of k
according to the SNR range is expected to be better than LLL LR aided SIC's. Even DMSD with k = 1 has
lower complexity than LLL LR aided SIC's for E^/Nq > YldB even when the channel is block 
fading and steady for 10 symbol times. DMSD with k = 1 is better in complexity than LLL LR
aided SIC's for $E_b/N_0 > 2$ dB when the channel is fast fading. DMSD with k > 2 is obviously
better in complexity than LLL LR aided SIC's, and DMSD with k < 3 is no worse than LLL
LR aided SIC's in error rate.

The BER curves in Fig. 2, Fig. 4, Fig. 6, and Fig. 8 present the diversity order transitions which
are analyzed in the former section. The transitions of the complexity curves in Fig. 3, Fig. 5,
Fig. 7, and Fig. 9 correspond to Theorem 3 to some degree.

VI. Conclusion 

Divided decoding offers diverse pairs of error rate and complexity for a given mother algorithm 
which has ML performance or near ML performance. Upper bounds of error rates and diversity 
orders of DSD for typical system models are obtained, from which we are assured that in many 
cases splitting the equation in consideration according to $n_1 \le n_2 \le \cdots$ is a best strategy
when divided decoding with fixed sub-vector sizes $\{n_j\}$ is applied. Divided decoding controls
the exponent, the number of added terms, or the bases appearing in the calculation of complexity
and shows the trade-off between error rate and complexity. On the basis of this observation, we
can design advanced decoding algorithms flexible in complexity and error rate by using divided
decoding. We observe that DMSD is better than DSD in both error rate and complexity if we
know the SNR. In comparison with LLL LR aided SIC's, DMSD and DSD perform better in
error rate and complexity if the channel varies fast, and still perform better for wide ranges
of SNR when the channel changes slowly. For further studies, adaptive applications of divided
decoding to given conditions need to be considered.

Fig. 1. The behavior of $\gamma(k, k+1)$ and $\gamma(k/2, k+1)$.




Fig. 2. BER curves of the system with 8 transmit and 8 receive antennas and 16QAM when divided decoding with SD is
employed according to both $\{i_0 = 1\}$ and $\{j_0 = 7\}$.




Fig. 3. The sample means of the number of multiplications required for divided decoding with SD for both $\{i_0 = 1\}$ and
$\{j_0 = 7\}$, where the sample size is 100,000 and the system uses 8 transmit and 8 receive antennas and 16QAM.

Fig. 4. BER curves of the system with 8 transmit and 8 receive antennas and 16QAM when divided decoding with SD is
employed according to $\{i_0 = 0\}$, $\{i_0 = 1\}$, $\{i_0 = 2\}$, $\{i_0 = 3\}$, $\{i_0 = 4\}$, $\{i_0 = 5\}$, $\{i_0 = 6\}$, and $\{i_0 = 7\}$. In addition, for
more effective comparison the BER curves of SIC with LLL LR applied to H, marked by 'LLL-H', and SIC with LLL LR
applied to the MMSE-extended H, marked by 'LLL-Hext', are included.

Fig. 5. The sample means of the number of multiplications required for divided decoding with SD according to $\{i_0 = 0\}$,
$\{i_0 = 1\}$, $\{i_0 = 2\}$, $\{i_0 = 3\}$, $\{i_0 = 4\}$, $\{i_0 = 5\}$, $\{i_0 = 6\}$, $\{i_0 = 7\}$ and those for both SIC with LLL LR applied to H and SIC
with LLL LR applied to the MMSE-extended H, where the sample size is 100,000 and the system uses 8 transmit and 8 receive
antennas and 16QAM.

Fig. 6. BER curves of the system with 8 transmit and 8 receive antennas and 16QAM when DSD with k split sub-systems of
equal size (except for k = 3) and k = 1, 2, 3, 4, 8 are performed on (3). When k = 3 the sub-vector sizes are 2.5, 2.5, 3. For
more effective comparison the BER curves of both SIC with LLL LR applied to H and SIC with LLL LR applied to the
MMSE-extended H are included.

Fig. 7. The sample means of the number of multiplications required for DSD with k split sub-systems of equal size for
k = 1, 2, 4, 8 to find an approximate solution of (3), and those required for both SIC with LLL LR applied to H and SIC with
LLL LR applied to the MMSE-extended H. When k = 3 the sub-vector sizes are 2.5, 2.5, 3.

Fig. 8. BER curves of the system with 8 transmit and 8 receive antennas and 16QAM when DSD with k split sub-systems of
equal size (except for k = 3) and k = 1, 2, 3, 4, 8 are performed on (9). When k = 3 the sub-vector sizes are 2.5, 2.5, 3. For
more effective comparison the BER curves of both SIC with LLL LR applied to H and SIC with LLL LR applied to the
MMSE-extended H are included.

Fig. 9. The sample means of the number of multiplications required for divided decodings based on SD with k split sub-systems
of equal size (except for k = 3) for k = 1, 2, 3, 4, 8 to find an approximate solution to (9), and those required for both SIC
with LLL LR applied to H and SIC with LLL LR applied to the MMSE-extended H. When k = 3 the sub-vector sizes are 2.5, 2.5, 3.